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Abstract 

We explore the performance of a statistical learning technique based on Gaussian Process (GP) 
regression as an efficient non-parametric method for constructing multi-dimensional potential en¬ 
ergy surfaces (PES) for polyatomic molecules. Using an example of the molecule N 4 , we show that 
a realistic GP model of the six-dimensional PES can be constructed with only 240 potential energy 
points. We construct a series of the GP models and illustrate the convergence of the accuracy of 
the resulting surfaces as a function of the number of ab initio points. We show that the GP model 
based on ~ 1500 potential energy points achieves the same level of accuracy as the conventional 
regression fits based on 16,421 points. The GP model of the PES requires no fitting of ab initio 
data with analytical functions and can be readily extended to surfaces of higher dimensions. 
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INTRODUCTION 


Accurate potential energy surfaces (PES) describing the interactions of atoms in poly¬ 
atomic molecules are required for calculations of the ro-vibrational energy levels and the 
dynamical properties of molecules. While classical dynamics calculations may use local 
values of the potential energy and its gradient, quantum dynamics calculations generally re¬ 
quire the knowledge of the global PES. Obtaining the global PES for a polyatomic molecule 
requires quantum chemistry calculations of the potential energy at different molecular ge¬ 
ometries and the construction of an analytical ht to interpolate the computed energies. 
The resulting PES must be continuous, smooth, differentiable and free of unphysical varia¬ 
tions. As the complexity of the molecule increases, producing an analytical £t that satisfies 
these requirements becomes exceedingly difficult [l|. The difficulty arises from (i) the need 
to calculate the potential energy at a large number of coordinates; (ii) the complexity of 
the analytical functions and fitting procedures necessary for representing multi-dimensional 
surfaces. 

Several methods have been developed recently to reduce the difficulty of htting PESs for 
polyatomic molecules. For example, an n-mode representation was proposed to construct a 
PES as a series of intrinsic potentials accounting for a small number of normal modes Mi. 
An alternative approach - or one that can be used in combination with the n-node represen¬ 
tation - is to take advant^e of the symmetry of multi-dimensional PES under permutations 
of identical atoms [l, 5, 6|. The goal of these approaches is to £t a high-dimensional PES 
by fitting functions of lower dimensionality. However, both of these approaches are complex 
and, therefore, difficult to implement; are molecule-specihc (i.e. the htting procedure and 
the choice of analytical functions are different for different molecular species and different 
number of active degrees of freedom); and meet with the same challenges as the conventional 
fitting procedures when the dimensionality of the PES increases. 

In order to construct accurate PES for large polyatomic molecules or molecular complexes, 
it is desirable to develop alternative approaches that would be 


(i) easy to implement; 

(ii) scalable; 

(hi) universal (be the same for any molecule and any number of dimensions); 


2 










(iv) efficient (i.e. using a small number of ab initio calculations per dimension). 


This can be achieved by combining quantum chemistry calculations with machine-learning 
techniques developed for efficient interpolation of multi-dimensional spaces Q. There are, at 
least, two machine-learning methods that can be used for obtaining high-dimensional PESs: 
an approach based on artificial neural networks js] and Gaussian process regression {d]. The 


ap 


ication of artificial neural networks to fitting PESs has been explored in several studies 


16j . Gaussian process (GP) models have been proposed for constructing force-fields 


in Refs. 


17 


2l| and for a variety of applications in molecular collision dynamics in Refs. 


22I. l23j . However, to the best of our knowledge, the accuracy and efficiency of GP regression 


for obtaining a global PES for a polyatomic molecule have not been systematically assessed 
before. It is generally not known how many ab initio points are required for a GP regression 
to produce a physical PES and whether the error of GP fits can be reduced to the accuracy 
of ab initio calculations. 

In the present work, we explore the efficiency of Gaussian processes for interpolating 
multi-dimensional PESs by producing a series of GP models of the six-dimensional PES for 
N 4 . We use the potential energy data computed by Paukku et ah [24] to train the GP models 
and explore the accuracy of the resulting surfaces as a function of the number of training 
points. This PES covers a wide range of energies between 0 and 1000 kcal/mol and exhibits 
a complex energy landscape. We show that a realistic model representing accurately the low- 
energy part of the 6D PES can be obtained with only 240 quantum chemistry calculations. 
We illustrate that the GP model based on between 1200 and 1800 potential energy points 
achieves the same level of accuracy as the conventional regression fits based on 16,421 points, 
capturing the features of the PES even at energies as high as 1000 kcal/mol. 


WHY GAUSSIAN PROCESSES FOR FITTING PES? 


Given the abundance of literature on the application of the artificial neural networks to 

nn 

fitting PESs [10H16| , why to explore an alternative statistical learning technique? Gaussian 
processes offer, at least, five major advantages for obtaining PESs: 


(i) GP regression is a kernel-based statistical learning technique and, as such, is generally 


easier to implement than the artificial neural networks 


17| 
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(ii) There is no need to fit any computed data with analytical functions 


0 , 


25 


2^. A 


GP model is determined by correlations between potential energy points in a multi¬ 
dimensional configuration space and provides a statistical prediction for the value of 
the potential energy within the configuration space. The correlations do not need to 
be known exactly; rather, one uses the best estimates of the correlations parametrized 
by a simple analytical function. 

(iii) With a proper parametrization of correlations between the energy points, a GP model 
is guaranteed to yield a smooth and differentiable surface that passes through the po¬ 
tential energy points used for training the model 


27 


30j . The choice of the correlation 


function controls the differentiability of the resulting surface. 


(iv) The GP models scale favorably with the dimensionality of the problem. A rule of 
thumb is that a GP model should require on the order of 10 x g points for interpolating 
a surface with q dimensions 3l|. This implies that a 6D PES can be obtained with 
only 60 quantum chemistry calculations. The purpose of this work is to test this 
rule of thumb in application to fitting the molecular PESs, typically characterized by 
smooth but wide variation of energy and the presence of multi-dimensional minima 
and barriers in the intramolecular coordinate space. Artificial neural networks are 


expected to require many more ab initio points 


16|. 


(v) GP models are guaranteed to become more accurate when trained by more quantum 
chemistry calculations 31|. 


As will be clear from the discussion in the following section, the numerical effort associated 
with training a GP model is 0{n^), where n = {p x q)^ is the number of ab initio training 
points, q is the dimensionality of the PES and p is the effective number of ab initio points 
per dimension. Once a GP model is trained, the evaluation of the potential energy using 
the GP model is reduced to a product of a square matrix and a column vector with the 
dimension equal to the number of training points. Thus, the numerical efficiency of the 
PES evaluation based on a GP model is For applications requiring the evaluation 

of the PES at a large number of intramolecular coordinates, this may become a bottleneck 
when px q K, 10^ - 10^. One should, therefore, expect the GP regression to be a method of 
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preference for dimensions q < 100 and the artificial neural network approach - for problems 


with large q > 100. 


FITTING PES WITH A GAUSSIAN PROCESS 


We will denote a GP by F(-). A GP is a family of normally distributed random multi¬ 
dimensional functions characterized by a mean function /i(-) and a covariance function 
K{-, ■). For a GP with constant variance the covariance function is K{-, ■) = cr‘^R{-, •), 
where is a correlation function. A point in a multi-dimensional space of interest is 

specihed by a vector x. A realization of a GP at a given x is the value of a random function 
drawn from the normal distribution and evaluated at x. Multiple outputs of a GP at the 
same x produce a Gaussian distribution of values F{x). 

le GP models to molecular dynamics problems has been described in 


The application 

of t 

le ( 

our previous work 

22, 

23] 


molecule are given by a g-dimensional vector x = (xi, ...^Xq)' . Our goal is to construct the 


global PES, given n potential energy values at vectors Xi, 


I 


We model the collection 


of n points in the g-dimensional space by a GP and assume that each of these potential 
energies V{xi) computed by a quantum chemistry method is a realization of a GP at Xi. 
The multiple outputs of a GP at the given n points = {^F{xi),F{x 2 )., - ■ ■ ,F(xn)^ 
follow a multivariate normal distribution 


~ MVN(/3, a^A) 

where (3 is the mean vector and A is a n x n matrix dehned as 

/ 1 .W ... ^ A 


( 1 ) 


A = 


1 R{Xi^X 2) ■■■ R{Xi^Xr 

R{x2,Xi) 1 : 


( 2 ) 


y R{xn^ ®i) ■ ■ ■ 1 y 

For simplicity, and without loss of generality, we assume that (3 = /3I, where I is the identity 
vector of dimension n and f3 is an unknown scalar parameter. 


We describe the correlation function i?(-, •) by the following expression 27H30|]: 
1 


R{x, a/) = <( 

. i=l 


^ ^ \/b\xi-x[\ ^ - x'^f\ ( y/b\xi-x'i\ 

I I ^ r) I GXp I 






Ui 


( 3 ) 
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where Ui are the unknown parameters representing the characteristic leng 


correlations. Eq. 


is a special case of the Matern correlation function 

R{x, *) = 


27 


1 scales of the 


30 | defined as 

(4) 


where di = ^/^\xi — x'l/cuj and is the modified Bessel function of order u. Eq. ([3]) 

represents the Matern correlation function with v = 5/2. The mathematical form of the 
correlation function determines the properties of the resulting GP. In particular, it deter¬ 
mines the existence of its derivatives. For example, if the correlation function is chosen to 
be a Gaussian 27H30|. the GP is differentiable to any order. With the Matern correlation 


function (|1]), the process is differentiable to order k < u. Thus, when the parameter u = 5/2, 
the GP is twice differentiable. 

To hnd the parameters uj = (a;i, 1 x 2 , ■■ ■ , ujqY of the correlation function making the pre¬ 
dicted potential energy points “most likely”, known in statistics as the maximum likelihood 
estimators (MLE), we maximize numerically the log-likelihood function 


log£(a;|l"”') = -- [nloga^ + log(det(A)) n] , 


where 


n 


/3)TA-i(r-_/3), 


i3{uj) = (rA"^I)-irA"iF", 


(5) 


( 6 ) 


(7) 


and the hat over the symbol denotes the MLE. 

The goal is to make a prediction of the potential energy value at an arbitrary position 
X = Xq. The values Yq = F[xq) obtained by multiple realizations of the GP at Xq and the 
multiple outputs of the GP at training sites Y'^ = ^F(a:i), F{x 2 ), • • • , F{xn)j are jointly 


distributed as 



MVN 



An A 



( 8 ) 


where Aq = {R{xo, cci), R{xq, X 2 ), • • • , R{xo, Xn)Y is a column vector specihed by the cor¬ 
relation function i?(-|u>) with the MLE of u). This means that the conditional distribution 
of values Iq = F{xq) given the values Y'^ is a normal distribution with the conditional mean 


jx{x,)=p + A^A-\Y--(3) 


(9) 
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and the conditional variance 


a‘^{xo) = 0-^(1 - Aq A ^Ao). (10) 

We use Eq. (E]) as a prediction of the value of the potential energy at Xq and Eq. flTUD as 
the uncertainty of the prediction. 


RESULTS 


Following Ref. 2^, we illustrate the quality of the PES model by Figures 1-2 showing the 
potential energy curves for the dissociation N 4 —)■ N 3 -|- N at different values d of the distance 
between the centers of mass of the two nitrogen molecules. The different panels of Figures 
1-2 correspond to different geometries of the N 4 complex, cove ring the A-shaped, T-shaped, 
//-shaped and A-shaped sets illustrated in Figure 3 of Ref. [2^. The symbols in Figures 
1-2 represent the original potential energy data and the curves - the values computed by 
the GP models. 

In principle, the potential energy points in vector used for training the GP model 
can be chosen at random conhgurations of the molecule Xi. However, given a hxed number 
of training points, the accuracy and the stability of the GP model can be improved if the 
points are selected to cover evenly the conhguration space of interest. This can be achieved, 
for example, with the Latin hypercube sampling (LHS) method 32| known to cover a multi¬ 
dimensional space efficiently. In order to improve the efficiency of the GP models, we use 


the following sampling technique for the results shown in Figures 1-2. Paukku et ah 


24\ 


computed 15363 points for 9 combinations of internal angles (9 sets of geometries) for N 4 
and 1056 points for 77 combinations of internuclear distances for N 3 . We hrst treat each 
combination of three input variables as a single variable to get a reduced 4D PES. We then 
generate an n x 4 LHS matrix with values uniformly distributed on [0,1], and transform 
elements in each row to sample quantiles corresponding to the empirical distribution of input 
variables in the reduced 4D PES. If the sampled conhguration is not contained in the data 


set of Paukku et ah 


24| . we use the nearest available conhguration. It would be better to 


use the LHS in six dimensions. However, that would lead to a lot of conhguration points 
not contained in the data of Paukku et ah 2^. The sampling technique used here requires 


a much smaller number of adjustments to the sampled conhgurations. 
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Figure 1 is obtained with only 240 potential energy points. Given the small number of ab 
initio points, the agreement between the ab initio data (symbols) and the results of the GP 
model is remarkable. Figure 1 illustrates that 240 potential energy points produce a htted 
GP model representing accurately the low-energy part (< 200 kcal/mol) of the global 6D 
PES and giving a qualitatively correct representation of the PES at high energies (> 200 
kcal/mol). 

The results of Figure 2 (obtained with 1200 potential energy points) show that increasing 
the number of potential energy points improves the quality of the model both at low energies 
and in the high energy region of the surface. In order to quantify the accuracy of the GP 
models of the PES and compare the result with those in Ref. 2^, we calculate the mean 
unsigned error (MUE) and the root-mean-square error (RMSE) for a series of GP models 
obtained with different numbers of potential energy points. The errors are dehned as 


1 


MUE=-V|^, 

n 


Vi\ 


( 11 ) 


i=l 


RMSE = 


\ 


-Eife 

n “ 


!/iP. 


( 12 ) 


i=l 


where iji is the value of energy obtained from the GP model and yi is the corresponding 
value of energy from the data of Paukku et ah 2^. To ensure that the accuracy of the GP 
models shown in Figures 1 - 2 is not accidental, we construct up to 100 different GP models 
for each number of ab initio points summarized in Table I and compute the average errors 
with the corresponding standard deviations of the errors. The number of the GP models was 
chosen to ensure convergence of both the mean values and the standard deviations reported 
in Table I. 


Table I illustrates that the GP model trained by 1800 potentia 
the PES with the similar accuracy as the htting procedure of Ref. 


energy points produces 


24| based on polynomial 


regressions and 16421 potential energy points. Table I also illustrates the monotonous 
improvement of both the model accuracy and the model stability as the number of potential 
energy points increases. The accuracy of the GP model increases by almost 40 % as the 
number of potential energy points increases from 240 to 480, and another 50 % as the 
number of points increases from 480 to 1800. The increase of the stability is evident from 
the decrease of the standard deviations. The data in Table I illustrate that the low-energy 










part of the PES corresponding to E < 100 kcal/niol can be represented with the RMSE < 2 
kcal/mol by a GP model with only 720 potential energy points. 


CONCLUSION 


We have constructed a series of Gaussian Process models of the global 6 D potential energy 
surface for the molecule N 4 and showed that an accurate surface spanning a wide range of 
energies between 0 and 1000 kcal/mol can be obtained with only ~ 1500 potential energy 
points. The number of potential energy points required for an accurate GP model of the 
PES can be decreased by selecting the points to follow a 6 D LHS |32| instead of a quasi-LHS 
used here and by using regression functions to approximate the unconditional mean function 
/i(-). For example, the mean of the GP can be modelled as 


/^(*) = = h(a;)^/3 

1=1 


(13) 


where h = (hi (a;), ..., hs{x))^ is a vector of s regression functions 25|, [2^ chosen to mimic 


the dependence of the PES on the corresponding variables. Since the variation of the PES 
with the individual bond lengths is often nearly quadratic at low energies and follows simple 
power laws in the limit of bond dissociations, it should often be possible to hnd suitable 
regression functions hj to optimize the GP model training. We note that the hnal results 
do not depend on the specihc form of the functions hs{x), as is clear from the calculations 
in the present article based on the simple choice of hi = 1 and hs>i = 0 . 

Our results have two important implications. First, a GP model, even with the simple 
choice of hi = 1 and hs>i = 0 , requires much fewer potential energy calculations for the 
construction of an accurate global potential surface than conventional htting techniques 
based on polynomial regressions. Second, the extension of the GP model discussed here to 
surfaces of higher dimensions is straightforward. Adding another dimension simply requires 
an extension of the expression for the correlation function to include another term in the 
product ([3]). Adding another dimension also requires more training points in the input 
vector Y"'. However, it is known that the number of additional training points required per 
added dimension decreases with the number of dimensions 3l|]. The numerical procedure 
of training the GP model is limited by the optimization of the log-likelihood function (j5j), 
involving iterative computations of the determinant and inverse of the matrix A. The current 
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limit on the number of dimensions amenable to GP modelling is limited to about 100 33| and 
there is currently active research on extending the applications of GP models to problems of 
higher dimensionality 3^. It is thus foreseeable that the GP model discussed here can be 
applied for constructing the PES for molecules with up to 100 internal degrees of freedom. 
Such a PES model would require an estimated 10,000 to 30,000 potential energy points, 
which is currently well within reach of state-of-the-art quantum chemistry calculations. 
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TABLE I. The mean unsigned errors (MUE in kcal/mol) and root-mean-square errors (RMSE in 
kcal/mol) of the Gaussian process models of the PES for N 4 trained by the different numbers of 
potential energy points. The errors of the Gaussian process models are obtained by averaging over 
up to 100 models with different samples of the potential energy points and reported with one unit 
of the corresponding standard deviations. The results in columns 3 and 4, labeled “low E”, are for 
the potential energy E < 100 kcal/mol. The results in columns 5 and 6 , labeled “all E”, give the 
errors of the global PES over the entire range of energies extending to E > 1000 kcal/mol. 



Number of points 

MUE (low E) 

RMSE (low E) 

MUE (all E) 

RMSE (all E) 

PES of Ref. [24j 

16421 

1.3 

1.8 

5.0 

14.3 

GP model 

240 

4.43 ± 1.37 

6.39 ± 1.77 

13.97 ± 2.02 

38.00 ± 7.92 

480 

2.61 ± 0.99 

4.19 ± 1.56 

9.68 ± 1.05 

28.67 ± 3.92 

720 

1.72 ± 0.52 

2.96 ± 0.78 

7.72 ± 0.78 

24.02 ± 3.15 

960 

1.39 ± 0.34 

2.58 ± 0.67 

6.57 ± 0.77 

21.57 ± 2.78 

1200 

1.23 ± 0.28 

2.26 ± 0.49 

5.76 ± 0.59 

19.46 ± 2.75 

1800 

0.96 ± 0.23 

1.81 ± 0.46 

4.45 ± 0.57 

15.87 ± 2.72 

2400 

0.74 ± 0.19 

1.50 ± 0.37 

3.83 ± 0.53 

14.33 ± 2.38 
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FIG. 1. PES for N 4 represented by the GP model trained with 240 potential energy points from 


Ref. 


24( 1 . The curves represent the GP model and the symbols ~ the ab initio data. The variable 


vb is the interatomic distance in one of the N 2 molecules. The interatomic distance of the other 
molecule is fixed to the equilibrium distance. The different curves correspond to different values of 
the separation d between the centers of mass of the two molecules. The different panels correspond 
to different geometries of the N 4 complex: (a) ^-shaped; (b) T-shaped; (c) fl-shaped; (d) X- 


shaped. These geometries are illustrated in Figure 3 of Ref. 


M- 
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FIG. 2. Same as in Figure 1 but for the PES for N 4 represented by the GP model trained with 


1200 potential energy points. 
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